DNA gives instructions for functioning, growth, and reproduction of organisms: living and dead
Hofman et al. (2015) Trends in Ecology & Evolution. DOI: 10.1016/j.tree.2015.06.008
Carries much information about the past: ancestry, adaptation to different environments (e.g. diet, disease, etc.)
i.e., YOU CAN LEARN TOO!
@jfy133:
Cytosine, Thymine
Guanine, Adenine
C with G (think: CGI)
A with T (think: AT-AT walker)
C on one strand, G on the other (or v.v.)
A on one strand, T on the other (or v.v.)
C, get new G (etc)
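The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the names `PAIRS`, `complement` and `reverse_complement` are made up for this example and do not come from any particular library.

```python
# Pairing rules from the slide: C pairs with G, A pairs with T.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the pairing base for each base, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in the reverse direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ACTG"))          # TGAC
    print(reverse_complement("ACTG"))  # CAGT
```

Taking the complement twice returns the strand you started with, which is why one strand is enough to rebuild the other.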
Converting the chemical nucleotides of a DNA molecule to ACTG on your computer screen
More ‘second’ generation (see: Nanopore)
Replicate a strand, but add complementary fluorophore-modified nucleotides, one colour per base
Ju et al. (2006) PNAS. DOI: 10.1073/pnas.0609513103
In Illumina: A G T C
Fire mah lazer, and take a picture! Rinse and repeat!
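As a toy model of the cycle described above (add a labelled complementary nucleotide, take a picture, repeat), the sketch below calls one base per cycle. The colour assignment in `COLOUR_OF_BASE` is hypothetical and chosen only for illustration; it is not the real Illumina colour scheme, and the function name `sequence_by_synthesis` is also made up.

```python
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

# HYPOTHETICAL colours, for illustration only (one colour per base).
COLOUR_OF_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_OF_COLOUR = {colour: base for base, colour in COLOUR_OF_BASE.items()}


def sequence_by_synthesis(template: str) -> str:
    """Call one base per cycle from the colour of the incorporated nucleotide."""
    called = []
    for template_base in template:
        # Add the complementary, fluorophore-labelled nucleotide.
        incorporated = PAIRS[template_base]
        # Fire the laser and take a picture: record the colour that lights up.
        colour = COLOUR_OF_BASE[incorporated]
        # Translate the colour back into a letter.
        called.append(BASE_OF_COLOUR[colour])
    return "".join(called)


if __name__ == "__main__":
    print(sequence_by_synthesis("ACTG"))  # TGAC
```

Note that the letters read out spell the newly built strand, i.e. the complement of the template.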
On a ‘flow cell’: glass slide with synthetic DNA ‘lawn’
Bronner et al. (2013) Current Protocols in Human Genetics. DOI: 10.1002/0471142905.hg1802s79
But how do you get your DNA to attach to the lawn
(and not get lost)?
AATGATACGGCGACCACCACaccgacaaCCCTACACGACGCTCTTCCGATCTXXXXXXAGCACACGTCTGAACTCCAGTCACgacactaCCGTCTTCTGCTTG
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTACTATGCCGCTGGTGGTGtggctgttGGGATGTGCTGCGAGAAGGCTAGAXXXXXXTCGTGTGCAGACTTGAGGTCAGTGctgtgatGGCAGAAGACGAAC
[Adapter & Index Primer] [Index] [Target primer] [Target] [Target primer] [Index] [Adapter & Index Primer]
Abizar Lakdawalla, CC BY 3.0, via https://openlab.citytech.cuny.edu/
EMBL-EBI Training, CC BY-SA 4.0, via https://www.ebi.ac.uk/training/
Remember: doing this millions of times at once!
© 2021 Illumina, Inc. All rights reserved. Used here for training purposes only.
FASTQ format is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores. Both the sequence letter and quality score are each encoded with a single ASCII character for brevity. - Wikipedia
Example (files can be gigabytes in size!)
@K00233:37:HGHLYBBXX:3:1101:2646:1121 1:N:0:NACGCATC+NGCTAATG
NCGCATGAGCCGCCTGTATCAGGCGCTGATCGAACCGGGCATTGCAGTTGGGATAGATCGGAAGAGCACACGTCTG
+
#A7F<<AA<JFJFJJJJJJFFJJJJJJJAFFJFJJJJJJJFJAFFFJAJFJJ<FJJJJJFFF<FFA--FFFJJJJJ
@K00233:37:HGHLYBBXX:3:1101:4655:1121 1:N:0:NACGCATC+NGCTAATG
NATGCATGACAGGAGGTGAGGGCATTTTCCAGATTTTCAGGCTGCGACCTTGAGCATCTTTCGCCGCTTCCAGCAC
+
#AA-<FFFF7JFF7JJJJJFJJ<JJJJJA7FJJJJJJJFF<JFF<J7-<FJJJJFJFFJJJAAAAFFJJ--AJAJJ
@ <read id, e.g. machine ID, location on flowcell> <extra metadata>
<DNA sequence; Note: N = base couldn't be called!>
+ <a separator>
<base quality scores for each nucleotide in sequence>
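The four-line layout described above can be read with a short Python sketch. It assumes the simple case shown here (exactly four lines per record, no blank lines, uncompressed text); `FastqRecord` and `read_fastq` are illustrative names, and the two records in the usage example are invented.

```python
import io
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str    # text after "@" up to the first space
    metadata: str   # anything after the first space (may be empty)
    sequence: str   # DNA letters; N means the base could not be called
    quality: str    # one quality character per base


def read_fastq(handle) -> Iterator[FastqRecord]:
    """Yield one record per four lines of a FASTQ text stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in: {header!r}")
        read_id, _, metadata = header[1:].partition(" ")
        yield FastqRecord(read_id, metadata, sequence, quality)


if __name__ == "__main__":
    # Made-up example records, much shorter than real reads.
    text = "@read1 1:N:0:ACGT\nNACGT\n+\n#AAFJ\n@read2\nACGT\n+\nFFFF\n"
    for record in read_fastq(io.StringIO(text)):
        print(record.read_id, record.sequence, record.quality)
```

Reading record by record, rather than loading the whole file, matters because real FASTQ files can be gigabytes in size.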
Quality score
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJ
0.2......................26...31........41
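The table above maps each character to a number by its position: `!` is 0 and `J` is 41, which matches subtracting 33 from the character's ASCII code. A minimal sketch of that decoding follows. The function names are illustrative, and the error-probability formula (P = 10^(-Q/10), the standard Phred definition) is not stated on the slide.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert quality characters to numeric scores (ASCII code minus offset)."""
    return [ord(character) - offset for character in quality]


def error_probability(score: int) -> float:
    """Standard Phred definition: chance that the base call is wrong."""
    return 10 ** (-score / 10)


if __name__ == "__main__":
    print(phred_scores("!#;@J"))   # [0, 2, 26, 31, 41]
    print(error_probability(20))   # about 0.01, i.e. 1 error in 100
```

So a `#` under an `N` in the example reads means a score of 2: the machine is telling you it had almost no confidence in that call.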
What is the command line?
A command-line interface (CLI) processes commands to a computer program in the form of lines of text. - Wikipedia
Remember this password - you won’t be asked for confirmation & you will re-use it in Microbiome Data Analysis!
A command prompt (or just prompt) is a sequence of (one or more) characters used in a command-line interface to indicate readiness to accept commands. - Wikipedia
<username>@<machine_name>:<current_directory>$
$ is where you type your command
Type in everything after the prompt, and press enter/return (⏎) on your keyboard
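To make the parts of the prompt concrete, here is a small Python sketch that builds and takes apart a prompt in the `<username>@<machine_name>:<current_directory>$` layout. The function names and the example values (`james`, `laptop`, `~/data`) are hypothetical, and real prompts can be configured to look different.

```python
def format_prompt(username: str, machine_name: str, current_directory: str) -> str:
    """Build a prompt in the <username>@<machine_name>:<current_directory>$ layout."""
    return f"{username}@{machine_name}:{current_directory}$"


def parse_prompt(prompt: str) -> dict:
    """Split a prompt of that layout back into its three parts."""
    body = prompt.rstrip()
    if body.endswith("$"):
        body = body[:-1]  # drop the marker after which commands are typed
    username, _, rest = body.partition("@")
    machine_name, _, current_directory = rest.partition(":")
    return {
        "username": username,
        "machine_name": machine_name,
        "current_directory": current_directory,
    }


if __name__ == "__main__":
    prompt = format_prompt("james", "laptop", "~/data")
    print(prompt)                # james@laptop:~/data$
    print(parse_prompt(prompt))
```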
Hello world!
-h or --help
What is in the room (directory)?
Let’s go into the directory, and see what’s in there!
How to go back?
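The three moves above (look around, step in, step back out) can be mirrored in Python with the standard `os` module. On the command line these are usually done with `ls`, `cd <directory>` and `cd ..`; those commands are not shown in this section, so treat the mapping as a rough guide. The function `walk_through` and the directory name `shelf` are made up for this sketch.

```python
import os
import tempfile


def walk_through(room: str) -> dict:
    """List a directory, step into its 'shelf' subdirectory, then step back out."""
    start = os.getcwd()
    try:
        os.chdir(room)
        seen = {"in_room": sorted(os.listdir("."))}    # roughly `ls`
        os.chdir("shelf")                              # roughly `cd shelf`
        seen["in_shelf"] = sorted(os.listdir("."))
        os.chdir("..")                                 # roughly `cd ..`
        seen["back_in_room"] = os.path.samefile(os.getcwd(), room)
        return seen
    finally:
        os.chdir(start)  # leave the caller where they started


if __name__ == "__main__":
    # Build a throwaway room with one shelf and one file on it.
    with tempfile.TemporaryDirectory() as room:
        os.mkdir(os.path.join(room, "shelf"))
        open(os.path.join(room, "shelf", "book.txt"), "w").close()
        print(walk_through(room))
```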
We will run the nf-core/eager pipeline.
nf-core/eager is a scalable and reproducible bioinformatics best-practise processing pipeline for genomic NGS sequencing data, with a focus on ancient DNA (aDNA) data. It is ideal for the (palaeo)genomic analysis of humans, animals, plants, microbes and even microbiomes.
Pipeline (software): a chain of data-processing processes or other software entities
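The idea of a pipeline as a chain of processing steps can be sketched as follows. The two steps here (`to_upper`, `strip_uncalled`) are toy examples invented for illustration; they are not steps of nf-core/eager, which chains full bioinformatics tools rather than small functions.

```python
from functools import reduce
from typing import Callable, Iterable

Step = Callable[[str], str]


def run_pipeline(data: str, steps: Iterable[Step]) -> str:
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda current, step: step(current), steps, data)


def to_upper(sequence: str) -> str:
    """Toy step: normalise letters to upper case."""
    return sequence.upper()


def strip_uncalled(sequence: str) -> str:
    """Toy step: remove uncalled bases (N) from both ends of the sequence."""
    return sequence.strip("N")


if __name__ == "__main__":
    print(run_pipeline("nacgtn", [to_upper, strip_uncalled]))  # ACGT
```

The order of the steps matters: each one only sees what the previous step produced.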
Fellows Yates et al. (2021) PeerJ. DOI: 10.7717/peerj.10947
GitHub copy: dag-material/<Intro to NGS>/assets/files/multiqc_report_testtsv_eager2_2_0.html